Method, computer program and system for controlling a plurality of robots, and computer-readable storage medium

ABSTRACT

A method for controlling a plurality of agents to complete a mission, including deriving a decomposition set of decomposition states in a set of possible states of an automaton, wherein the automaton characterizes the mission, deriving a sequence of actions to be carried out by the plurality of agents depending on the decomposition set, where each action is to be carried out by at most one of the plurality of agents.

CROSS REFERENCE

The present application claims the benefit under 35 U.S.C. § 119 of German Patent Application No. DE 102017215311.3 filed on Sep. 1, 2017, which is expressly incorporated herein by reference in its entirety.

FIELD

The present invention relates to a method for controlling a plurality of robots, a computer program and a system configured to carry out the method, and a computer-readable storage medium.

BACKGROUND INFORMATION

Linear Temporal Logic (LTL) is a mathematical specification logic which is able to capture temporal relationships. It originates from the field of model checking and verification.

SUMMARY

An example method in accordance with the present invention may have the advantage that it automatically generates optimal action-level behavior for a team of robots (or agents). In accordance with the present invention, separable tasks are optimally allocated to the available agents or robots, while avoiding the need to compute a combinatorial number of possible assignment costs, where each computation would itself require solving a complex planning problem. This improves computational efficiency, in particular for on-demand missions where task costs are unknown in advance.

Further advantageous aspects are described herein.

LTL can be applied to robotic behavior planning. It then provides a formalism to specify an expected behavior in an unambiguous way. As such, an LTL specification can be used to describe a result of the expected behavior, while the way to achieve this result can be automatically derived by the system.

An LTL formula ϕ can be defined over a set of atomic propositions Π. A single atomic proposition is denoted π∈Π. Each atomic proposition can be either true (⊤) or false (⊥). To express temporal relationships, the semantics of the formula ϕ can be defined over a sequence σ of propositions. Conveniently, the sequence σ is defined as a function of time index t, and σ(t)⊆Π for each t.

A proposition may be expressed in terms of concatenations of atomic propositions using the Boolean operators and (“∧”) and or (“∨”).

Boolean operators ¬ (“not”) and ∧ (“and”) and temporal operators ○ (“next”), 𝒰 (“until”) and ℛ (“release”) can be used to recursively define a satisfaction relation ⊨ as follows:

-   σ(t) ⊨ π iff π∈σ(t)
-   σ(t) ⊨ ¬ϕ₁ iff ¬(σ(t) ⊨ ϕ₁)
-   σ(t) ⊨ ϕ₁∧ϕ₂ iff σ(t) ⊨ ϕ₁ ∧ σ(t) ⊨ ϕ₂
-   σ(t) ⊨ ○ϕ₁ iff σ(t+1) ⊨ ϕ₁
-   σ(t) ⊨ ϕ₁ 𝒰 ϕ₂ iff ∃t₂≥t such that σ(t₂) ⊨ ϕ₂ and ∀t₁∈[t,t₂) it holds that σ(t₁) ⊨ ϕ₁
-   σ(t) ⊨ ϕ₁ ℛ ϕ₂ iff t₁=∞ or ∃t₁≥t such that σ(t₁) ⊨ ϕ₁ and ∀t₂∈[t,t₁) it holds that σ(t₂) ⊨ ϕ₂.
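The recursive satisfaction relation can be illustrated with a minimal Python sketch. It assumes a finite sequence σ (no proposition holds beyond its end), a hypothetical tuple encoding of formulas, and an illustrative function name `holds`; none of these are prescribed by the method itself.

```python
from typing import FrozenSet, Sequence, Tuple, Union

# Hypothetical encoding: an atomic proposition is a string; compound formulas
# are tuples such as ("not", f), ("and", f, g), ("next", f),
# ("until", f, g) and ("release", f, g).
Formula = Union[str, Tuple]
Trace = Sequence[FrozenSet[str]]


def holds(sigma: Trace, t: int, phi: Formula) -> bool:
    """Return True if sigma(t) satisfies phi on the finite sequence sigma."""
    n = len(sigma)
    if isinstance(phi, str):  # atomic proposition: pi in sigma(t)
        return t < n and phi in sigma[t]
    op = phi[0]
    if op == "not":
        return not holds(sigma, t, phi[1])
    if op == "and":
        return holds(sigma, t, phi[1]) and holds(sigma, t, phi[2])
    if op == "next":  # assumption: "next" fails at the last position
        return t + 1 < n and holds(sigma, t + 1, phi[1])
    if op == "until":
        return any(
            holds(sigma, t2, phi[2])
            and all(holds(sigma, t1, phi[1]) for t1 in range(t, t2))
            for t2 in range(t, n)
        )
    if op == "release":
        # Assumption: t1 = infinity is read as "phi2 holds up to the end of sigma".
        if all(holds(sigma, t2, phi[2]) for t2 in range(t, n)):
            return True
        return any(
            holds(sigma, t1, phi[1])
            and all(holds(sigma, t2, phi[2]) for t2 in range(t, t1))
            for t1 in range(t, n)
        )
    raise ValueError(f"unknown operator: {op!r}")


# Example sequence: proposition "a" holds twice, then "b" holds.
sigma = [frozenset({"a"}), frozenset({"a"}), frozenset({"b"})]
print(holds(sigma, 0, ("until", "a", "b")))
```

The evaluator mirrors the list item by item; a practical planner would typically translate the formula into an automaton instead of evaluating it recursively.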

A nondeterministic finite automaton is characterized by a tuple (Q, Q₀, α, δ, F) consisting of

-   a set of states Q,
-   a set of initial states Q₀⊆Q,
-   a set of Boolean formulas α over the set of atomic propositions Π,
-   transition conditions δ:Q×Q→α, and
-   a set of accepting (final) states F⊆Q.

Note that the term nondeterministic finite automaton is used in the broad sense that also encompasses deterministic finite automata, i.e., every deterministic finite automaton is also a nondeterministic finite automaton in this sense.

For two states q_(i),q_(j)∈Q, the absence of a transition between these two states is denoted by δ(q_(i),q_(j))=⊥. Accordingly, there exists a transition between these two states if δ(q_(i),q_(j))≠⊥, and the Boolean formula δ(q_(i),q_(j)) denotes the transition condition.

A sequence σ over propositions, when applied to the nondeterministic finite automaton, describes a sequence of states q∈Q, called a run ρ: ℕ∪{0}→Q. The run ρ is called feasible if it starts in an initial state ρ(0)=q₀ with q₀∈Q₀ and if all transition conditions are satisfied along the run, i.e., σ(t) ⊨ δ(ρ(t−1),ρ(t)) for all t≥1. A run ρ is called accepting if it is feasible and ends in an accepting state q_(n)∈F. Sequence σ is said to violate the specification if it does not describe a feasible run.

If sequence σ describes a feasible but not an accepting run, it does not satisfy the specification. If sequence σ forms a prefix of an accepting run and can be extended to a sequence satisfying the specification, it is said that σ partially satisfies ϕ.
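The notions of feasible and accepting runs can be sketched in Python as follows. The sketch assumes that transition conditions are given as Boolean callables on a set of propositions, that a missing entry in `delta` stands for the absent transition, and that `sigma[t-1]` is the input consumed on the t-th transition; the name `classify_run` is purely illustrative.

```python
from typing import Callable, Dict, FrozenSet, List, Set, Tuple

Letter = FrozenSet[str]
Condition = Callable[[Letter], bool]


def classify_run(
    run: List[int],
    sigma: List[Letter],
    delta: Dict[Tuple[int, int], Condition],
    initial: Set[int],
    accepting: Set[int],
) -> str:
    """Classify a run as 'infeasible', 'feasible' or 'accepting'."""
    if len(run) != len(sigma) + 1 or run[0] not in initial:
        return "infeasible"
    for t in range(1, len(run)):
        cond = delta.get((run[t - 1], run[t]))  # None models the absent transition
        if cond is None or not cond(sigma[t - 1]):
            return "infeasible"
    return "accepting" if run[-1] in accepting else "feasible"


# Hypothetical two-state automaton: state 1 is reached once "a" is observed.
delta = {
    (0, 0): lambda s: True,
    (0, 1): lambda s: "a" in s,
    (1, 1): lambda s: True,
}
print(classify_run([0, 1], [frozenset({"a"})], delta, {0}, {1}))
```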

A given mission that is to be completed by a set of agents can be expressed in terms of an LTL formula ϕ or equivalently in terms of a nondeterministic finite automaton. It may be given as a set of tasks. The tasks are independent parts of the mission that can be allocated to the agents. The above-mentioned set of tasks is called a decomposition of the mission. This implies two decomposition properties which are fulfilled by all tasks. The tasks have to be mutually independent, i.e., execution or non-execution of a first task must not violate a second task. Furthermore, completion of all of the tasks implies completion of the mission.

This enables the agents to act independently, without any coordination, and execution does not have to be synchronized between the agents.

A task may be specified by an LTL formula ϕ^((i)) or by a corresponding nondeterministic finite automaton. The conditions of mutual independence and completeness can be expressed by saying that any strategy that satisfies each LTL formula ϕ^((i)) of the tasks in a strict subset of the set of tasks partially satisfies the LTL formula ϕ that specifies the mission.

Consequently, completing the subset of tasks can be associated with reaching a certain state in the nondeterministic finite automaton that also specifies the mission. However, not every state implies completion of a set of tasks when requiring the above properties.

Therefore, a first aspect of the invention makes use of a decomposition set of the nondeterministic finite automaton that specifies the mission. The decomposition set contains all states q which can be associated with completing a subset of tasks which is a subset of the decomposition of the mission.

Based on the decomposition set, a team model can be constructed that can be augmented to contain all possible decomposition choices. This team model can then be used for efficiently planning an optimal decomposition and a corresponding allocation of tasks to agents. It can also be used for simultaneously planning action sequences to execute the mission.

To make the relation between the formula ϕ^((i)) that specifies a task and the LTL formula ϕ that specifies the mission clear, we let {ϕ^((i))} with i=1, . . . , n be a set of finite LTL specifications for the tasks and let {σ_(i)} denote sequences that satisfy the tasks {ϕ^((i))}, i.e. σ_(i) ⊨ ϕ^((i)) ∀i∈{1, . . . , n}. The tasks are a decomposition of the mission if and only if σ_(j₁) . . . σ_(j_i) . . . σ_(j_n) ⊨ ϕ for all permutations of j_(i)∈{1, . . . , n} and all respective sequences σ_(i). If the tasks are a decomposition of the mission, they fulfill the decomposition properties of independence and completeness regarding the mission.

The several aspects of the present invention avoid the need to compute a combinatorial number of possible assignment costs, where each computation would itself require solving a complex planning problem, thus improving computational efficiency, in particular for on-demand missions where task costs are unknown in advance.

Therefore, in the first aspect, the present invention includes a method for controlling a plurality of agents to complete the mission, comprising the steps of:

-   deriving the decomposition set of decomposition states in the set of possible states Q of the automaton, wherein the automaton characterizes the mission,
-   deriving a sequence β_(fin) of actions (a₁,a₂, . . . , a_(n)) to be carried out by the plurality of agents depending on the decomposition set, where each action (a₁,a₂, . . . , a_(n)) is to be carried out by at most one of the plurality of agents.

Preferably, the method may further comprise the step of controlling the plurality of agents in accordance with the derived sequence β_(fin) of actions (a₁,a₂, . . . , a_(n)).

In another aspect of the present invention, the method further comprises the step of generating the decomposition set by exploring an essential sequence σ_(e) of an accepting run ρ_(i) through one or more candidate decomposition states q_(i).

Preferably, this method further comprises the step of adding the one or more candidate decomposition states q_(i) to the decomposition set depending on whether a complementary sequence σ̂_(e) to the explored essential sequence σ_(e) around the respective one or more candidate decomposition states q_(i) is accepting.

Even more preferably, the decomposition set consists of all those states q_(i) in the set of possible states Q of the automaton for which the complementary sequence σ̂_(e) to the explored essential sequence σ_(e) around the respective state q_(i) is accepting.

In another aspect of the present invention, the method further comprises the step of generating a team model based on the automaton that characterizes the mission and based on automata that each characterize the capabilities of one of the plurality of agents.

Preferably, it may be envisaged that the team model comprises a set of actions that comprises switch transitions which change the acting agent from one of the plurality of agents to another one of the plurality of agents.

That is, individual agents are assumed to act independently and, based on the decomposition set, special transitions (the switch transitions) indicate the options to split the mission at some state and allocate the rest to a different agent. In other words, the switch transitions are purely virtual transitions that by themselves do not lead to any actions of the agents.

More preferably, these switch transitions are configured to each change the acting agent from one of the plurality of agents to a next one of the plurality of agents. This is particularly useful because it implies that, starting in a state associated with a first agent r, no state associated with an agent r′<r can be reached by any path in the team model.

As indicated above, preferably the switch transitions are configured such as to only act if the automaton is in a decomposition state.

In another aspect of the present invention, the method further comprises the step of deriving the sequence β_(fin) of actions (a₁, a₂, . . . , a_(n)) to be carried out by the plurality of agents by a label-setting algorithm in which each state s of a set of states of the team model is associated with labels l that are characterized by a sequence β of actions leading to the respective state s. That is, the label-setting algorithm searches for a final label l_(fin). Finding the final label l_(fin) is equivalent to finding the respective sequence β_(fin) of actions that satisfies the mission.

Preferably, this method further comprises the step of constructing a reachable set of temporary labels L_(t,s) for each state s and a set of permanent labels L_(p,s).

Even more preferably, this method further comprises the step of constructing, for each selected label l*, a set V of consecutive labels v by extending an action sequence β associated with the selected label l* by all available actions a and adding the resulting labels l_(v) to the reachable set of temporary labels L_(t,s).

Preferably, each label l comprises at least one component that characterizes a cost ĉ_(β) under the corresponding sequence β of actions a.

Even more preferably, it may be envisaged that the derived sequence β_(fin) of actions (a₁,a₂, . . . , a_(n)) to be carried out by the plurality of agents is the one out of all action sequences that satisfy a characterization ϕ of the mission that minimizes a team cost κ̂ which depends on the component that characterizes the cost ĉ_(β).

Preferably, only actions a resulting in Pareto-optimal labels l_(v) at their target state v are added to the reachable set of temporary labels L_(t,s). This makes for a very efficient implementation.

In another aspect of the present invention, the component that characterizes the cost ĉ_(β) under the corresponding sequence β of actions a depends on costs c_(a,r) associated with each of these actions a, with one component for each one of the agents.

Preferably, the component that characterizes the cost ĉ_(β) under the corresponding sequence β of actions a is stored in memory by way of a data structure that comprises at least one component c_(β,r) that characterizes costs associated with a selected one of the agents 11, 12, 13 and at least one component ∥(c_(β,1), . . . , c_(β,r-1))^(T)∥_(∞), ∥(c_(β,1), . . . , c_(β,r-1))^(T)∥₁ that characterizes the costs associated with a group of agents that precede the selected one of the agents.

This makes use of the surprising fact that, starting in a state associated with agent r, no state associated with a preceding agent r′<r can be reached by any path in the team model, i.e., no action associated with any r′ will occur in a continuation of the corresponding sequence β.

In another aspect of the present invention, each label l comprises at least one component that characterizes a resource status γ at the respective state s under the corresponding sequence β of actions.

Preferably, the characterization ϕ of the mission comprises an inequality constraint that restricts the at least one component that characterizes a resource status γ to a predefined region.

BRIEF DESCRIPTION OF THE DRAWINGS

The present invention is explained in more detail below with reference to figures.

FIG. 1 shows a robot control system according to a first aspect of the present invention.

FIG. 2 shows a flow-chart diagram that illustrates a preferred method according to a further aspect of the present invention.

FIG. 3 shows a flow-chart diagram which relates to a preferred algorithm to determine the decomposition set.

FIG. 4 shows a flow-chart diagram which relates to a preferred method to construct the team model.

FIG. 5 shows a flow-chart diagram which relates to a preferred method to plan an optimal action sequence.

FIG. 6 illustrates an example of the structure of the team model.

DETAILED DESCRIPTION OF EXAMPLE EMBODIMENTS

FIG. 1 shows a robot control system 10 that is configured to plan and allocate tasks to a plurality of agents, preferably robots 11, 12, 13, such that the agents, by fulfillment of their respective tasks, jointly achieve a common goal, thus achieving a predefinable mission. Robot control system 10 is equipped with communication means (not shown), e.g., a wireless networking transceiver, to communicate with each of the robots 11, 12, 13 via a communication link 22, 23, 24. Similarly, each of the robots 11, 12, 13 is equipped with corresponding communication means.

In a preferred embodiment, the robot control system 10 comprises a computer 20 with memory 21, on which a computer program is stored, said computer program comprising instructions that are configured to carry out the method according to aspects of the present invention described below if the computer program is executed on the computer 20.

In further aspects of the preferred embodiment, the robots 11, 12, 13 comprise a computer 11 b, 12 b, 13 b each, said computer being equipped with a computer memory each (not shown) on which a computer program is stored, said computer program comprising instructions that are configured to carry out some or all of the method according to further aspects of the invention described below if the computer program is executed on the computer 11 b and/or 12 b and/or 13 b. Preferably, the robots 11, 12, 13 each comprise actuators 11 a, 12 a, 13 a that enable each of the robots to physically interact with an environment in which the robots are placed.

FIG. 2 shows a flow-chart diagram that illustrates a preferred method according to a further aspect of the present invention. In a first step 1000 the method receives one agent model for each of the agents or robots 11, 12, 13. The agent models can for example be read from a dedicated location in computer memory 21.

These agent models are preferably each given as an automaton, a five-tuple consisting of

-   a set of states that the corresponding agent or robot can be in,
-   an initial state, which is an element of the set of states,
-   a set of possible actions that the corresponding agent or robot can carry out, each action being a pair of states,
-   a set of propositions Π, and
-   a labeling function λ that maps each state to a subset of Π, i.e., to an element of 2^(Π).

Modeling the agents as automata as described is convenient because it is intuitive to model the internal state and the actions of the agents as a state machine. Furthermore, it is convenient to model an abstraction of places in the environment as a topological map.

Independently of step 1000, the method receives a specification of the mission in step 1100. Preferably, this mission specification is an LTL specification, e.g. a set of tasks. In a following step 1200, this mission specification is converted into a nondeterministic finite automaton. Note that steps 1100 and 1200 are optional. Alternatively, the method may directly receive the mission specification as the nondeterministic finite automaton. Then, in step 1300, the method determines the decomposition set depending on the automaton. A preferred embodiment of this determination procedure is explained in detail in FIG. 3.

Following steps 1000 and 1300, the method constructs a team model depending on the automaton, the decomposition set and the agent models in step 1400. A preferred embodiment of this construction procedure is explained in detail in FIG. 4.

In the following step 2000, the method carries out a procedure of planning an optimal action sequence β_(fin) based on the team model, which is explained in detail in FIG. 5.

In step 3000, the optimal action sequence β_(fin) is translated into executable commands for the agents or robots 11, 12, 13, for example by means of a lookup table that may be stored in computer memory 21. The executable commands are each associated with one of the agents or robots 11, 12, 13 and distributed to the respective agent or robot 11, 12, 13 via one of the communication links 22, 23, 24. The respective agent or robot 11, 12, 13 then executes this command and, preferably upon completion of the command, sends a confirmation message to the robot control system 10. In case the execution of this command is not possible, the respective agent or robot 11, 12, 13 may send an error notification to the robot control system 10, which may react accordingly. In case the robot control system 10 receives a confirmation message that a command has been executed, it may send a next command to a next respective agent or robot 11, 12, 13. In case it receives an error notification, it may enter a dedicated mode, e.g., a shut-down mode of all agents or robots 11, 12, 13.

FIG. 3 shows a flow-chart diagram that depicts a further aspect of the present invention, which relates to a preferred algorithm to determine the decomposition set. This method starts in step 1310 with a step of reading the aforementioned automaton. An index i is used to label all the states {q_(i)}=Q of the set of states Q that is associated with the automaton. This index is initialized to an initial value, e.g. i=1. The decomposition set is initialized as the empty set Ø.

Then, in step 1320, the method constructs an accepting run ρ_(i) that passes through state q_(i) corresponding to the present value of the index i. State q_(i) is the candidate decomposition state. Such an accepting run ρ_(i) may for example be constructed by exploring the graph defined by the transition conditions δ associated with the automaton and constructing a first partial run ρ_(f) from state q_(i) to initial state q₀ associated with the automaton while considering inverted transitions and a second partial run ρ_(l) from state q_(i) to a final state f∈F associated with the automaton. The accepting run ρ_(i) that passes through q_(i) may then be constructed by concatenating the inverted first partial run ρ_(f) and the second partial run ρ_(l).

In the following step 1330, the method generates an essential sequence σ_(e) associated with the accepting run ρ_(i).

A sequence σ is called essential for the nondeterministic finite automaton and associated with a run ρ if and only if it describes the run ρ in the automaton and σ(t)\{π} ⊭ δ(ρ(t−1), ρ(t)) for all t and propositions π∈σ(t), i.e., σ contains only the required propositions.

For example, the essential sequence σ_(e) may be generated from the accepting run ρ_(i) by converting the corresponding transition conditions δ(ρ_(i)(t−1),ρ_(i)(t)) of the accepting run ρ_(i) to their respective disjunctive normal form and successively adding, for all t, all propositions of one conjunctive clause each to the essential sequence σ_(e).

In the following step 1340, the method generates a complementary sequence σ̂_(e) of the essential sequence σ_(e). To this end, partial sequences σ₁ and σ₂ are generated, with σ₁ being the part of essential sequence σ_(e) from its initial state to state q_(i) and σ₂ being the remaining part of essential sequence σ_(e) from state q_(i) to its final state, i.e., essential sequence σ_(e)=σ₁σ₂ is a concatenation of these two partial sequences σ₁ and σ₂. The complementary sequence σ̂_(e) is then generated by reversing the order of these two partial sequences σ₁ and σ₂, i.e., σ̂_(e)=σ₂σ₁.

Next follows step 1350, in which it is checked whether or not the complementary sequence σ̂_(e) corresponds to an accepting run (which amounts to a simple iterative check whether the propositions of the complementary sequence σ̂_(e) satisfy the transition conditions δ). If it is determined that the complementary sequence σ̂_(e) is an accepting sequence, the method branches to step 1360, in which state q_(i) is added to the decomposition set, after which the method continues with the execution of step 1370. If it is determined that the complementary sequence σ̂_(e) is not an accepting sequence, the method skips directly to step 1370.

In step 1370, it is checked whether index i has already iterated over all states q_(i) of set Q, preferably by checking whether i=|Q|. If not, the method branches to step 1380, in which index i is incremented by 1, and the method continues with a next iteration in step 1320. If, however, it is determined that the index i has already iterated over all states q_(i) of set Q, the method branches to step 1390, in which this part of the algorithm for determining the decomposition set ends.
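The loop of steps 1310 to 1390 can be sketched in Python under simplifying assumptions that are not part of the described method: every transition condition is a single conjunctive clause of positive propositions (so the essential sequence of a run is simply the list of required proposition sets), and the accepting run through q_(i) is found by two forward breadth-first searches instead of one search over inverted transitions. All function names are illustrative.

```python
from collections import deque
from typing import Dict, FrozenSet, List, Optional, Set, Tuple

# Assumption: delta maps (source, target) to the set of required propositions.
Delta = Dict[Tuple[int, int], FrozenSet[str]]


def shortest_path(delta: Delta, sources: Set[int], targets: Set[int]) -> Optional[List[int]]:
    """Breadth-first search for a state path from any source to any target."""
    parent: Dict[int, Optional[int]] = {s: None for s in sources}
    queue = deque(sources)
    while queue:
        q = queue.popleft()
        if q in targets:
            path = [q]
            while parent[path[-1]] is not None:
                path.append(parent[path[-1]])
            return path[::-1]
        for (src, dst) in delta:
            if src == q and dst not in parent:
                parent[dst] = q
                queue.append(dst)
    return None


def is_accepting(delta: Delta, initial: Set[int], accepting: Set[int],
                 sigma: List[FrozenSet[str]]) -> bool:
    """Step 1350: check whether sigma describes some accepting run."""
    current = set(initial)
    for letter in sigma:
        current = {dst for (src, dst), req in delta.items()
                   if src in current and req <= letter}
    return bool(current & accepting)


def decomposition_set(states: Set[int], initial: Set[int], accepting: Set[int],
                      delta: Delta) -> Set[int]:
    result: Set[int] = set()
    for q in states:  # steps 1320 to 1380, one candidate state per iteration
        head = shortest_path(delta, initial, {q})
        tail = shortest_path(delta, {q}, accepting)
        if head is None or tail is None:
            continue  # no accepting run passes through q
        sigma1 = [delta[(a, b)] for a, b in zip(head, head[1:])]
        sigma2 = [delta[(a, b)] for a, b in zip(tail, tail[1:])]
        if is_accepting(delta, initial, accepting, sigma2 + sigma1):
            result.add(q)  # step 1360
    return result


# Hypothetical mission "first a, then b": state 1 (a done, b pending) cannot
# be a decomposition state because swapping the two halves breaks the order.
E = frozenset()
ordered = {(0, 0): E, (0, 1): frozenset({"a"}), (1, 1): E,
           (1, 2): frozenset({"b"}), (2, 2): E}
print(decomposition_set({0, 1, 2}, {0}, {2}, ordered))
```

For a mission in which "a" and "b" may be achieved in either order, the same sketch keeps the intermediate states, since the swapped sequence is still accepted.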

FIG. 4 shows a flow-chart diagram that describes a preferred embodiment of a still further aspect of the present invention. This aspect relates to a method for constructing the team model from the nondeterministic finite automaton, the decomposition set and the agent models.

The method starts in step 1410, in which the nondeterministic finite automaton, the decomposition set and the agent models of all agents are received. For ease of notation, in the context of the discussion of FIG. 4 the models will be labelled with a generic labelling superscript (r), i.e., the agent model of agent r will be denoted with the superscript (r).

Next, in step 1420, a corresponding product model P^((r)) will be created for every agent model. By combining the agent model automaton of agent r with the nondeterministic finite automaton of the mission, the product model P^((r)) can be constructed to capture both the agent capabilities encoded in the agent model automaton and the specification of the mission encoded in the nondeterministic finite automaton. Dropping the superscript (r) for ease of notation, conveniently, the product model P may be given as the product (⊗) of the nondeterministic finite automaton and the agent model, a tuple of three components comprising

-   a set of states, given by the Cartesian product of Q and the set of states of the agent model,
-   a set of initial states, given by the Cartesian product of Q₀ and the set containing the initial state of the agent model,
-   a set of actions A_(p) consisting of all pairs ((q_(s),s_(s)),(q_(t),s_(t))) of product model states for which (s_(s),s_(t)) is an action of the agent model and λ(s_(s)) ⊨ δ(q_(s),q_(t)).

For a plurality of agents, especially a plurality of robots, the respective agent models may differ from each other, each representing the capabilities of the respective agent, while the nondeterministic finite automaton is determined by a particular specification of the mission. As such, the product model P may be constructed separately for each of the agents. It describes for each of the different agents how the mission can be executed by the agent to which the agent model corresponds.

Therefore, in a preferred embodiment, for each r∈{1, . . . , N} the corresponding product model P^((r)) is constructed as the product (⊗) of the nondeterministic finite automaton and the agent model of agent r, as defined above.
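The product construction of step 1420 can be sketched in Python as follows. The sketch assumes transition conditions given as Boolean callables on a set of propositions (a missing entry meaning no transition) and a labeling given as a dictionary; the function and parameter names are hypothetical.

```python
from itertools import product
from typing import Callable, Dict, FrozenSet, Hashable, Set, Tuple

Condition = Callable[[FrozenSet[str]], bool]


def product_model(
    Q: Set[int],
    Q0: Set[int],
    delta: Dict[Tuple[int, int], Condition],
    S: Set[Hashable],
    s0: Hashable,
    agent_actions: Set[Tuple[Hashable, Hashable]],
    label: Dict[Hashable, FrozenSet[str]],
):
    """Return (states, initial states, actions) of the product model."""
    states = set(product(Q, S))          # Cartesian product of Q and agent states
    initial = {(q, s0) for q in Q0}      # Q0 combined with the agent's initial state
    actions = {
        ((qs, ss), (qt, st))
        for (ss, st) in agent_actions
        for (qs, qt), cond in delta.items()
        if cond(label[ss])               # label of s_s must satisfy delta(q_s, q_t)
    }
    return states, initial, actions


# Hypothetical agent moving between "home" and a place "A" where "a" holds.
S = {"home", "A"}
agent_actions = {("home", "A"), ("A", "home"), ("home", "home"), ("A", "A")}
label = {"home": frozenset(), "A": frozenset({"a"})}
delta = {(0, 0): lambda l: True, (0, 1): lambda l: "a" in l, (1, 1): lambda l: True}

states, initial, actions = product_model({0, 1}, {0}, delta, S, "home", agent_actions, label)
print(len(states), initial)
```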

In order to combine a plurality of agents, it is possible to construct a team model automaton from the individual product models P^((r)). This is done in step 1430.

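The team model automaton is conveniently constructed as a union of all the local product models P^((r)) with r∈{1, . . . , N} as follows: The team model automaton is constructed as a tuple of four components, comprising

-   a set of states {(r,q,s): r∈{1, . . . , N}, (q,s) a state of the product model P^((r))},
-   a set of initial states, consisting of all states (r,q,s) of the team model with r=1 and (q,s) an initial state of the product model P^((r)),
-   a set of final states, consisting of all states (r,q,s) of the team model with q∈F,
-   a set of actions, given by the union over r of the sets of actions of the product models P^((r)).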

In the following step 1440, a set of switch transitions, which is a subset of the pairs of team model states, is determined. The set of switch transitions is defined as the set of all those transitions ((r_(s),q_(s),s_(s)),(r_(t),q_(t),s_(t))) between a starting state (r_(s),q_(s),s_(s)) and a terminal state (r_(t),q_(t),s_(t)) of the team model which

-   connect different agents, i.e. r_(s)≠r_(t),
-   preserve the progress in the nondeterministic finite automaton, i.e. q_(s)=q_(t),
-   point to the next agent, i.e. r_(t)=r_(s)+1,
-   point to an initial agent state, i.e. s_(t) is the initial state of the agent model of agent r_(t), and
-   start in the decomposition set, i.e. q_(s) is an element of the decomposition set.

Conveniently, the set of switch transitions may be constructed by traversing all states q_(s) in the decomposition set and all starting agent indices r_(s)∈{1, . . . , N−1}. For this choice of state q_(s) and starting agent index r_(s), traversing all states s_(s) for which (q_(s),s_(s)) is a state of the product model of agent r_(s) fixes r_(t),q_(t),s_(t) and thus yields the set of switch transitions.
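The traversal described for step 1440 can be sketched in Python as follows. The dictionaries `product_states` and `initial_agent_state`, keyed by the agent index r=1, . . . , N, as well as the function name are assumptions made only for this illustration.

```python
from typing import Dict, Hashable, Set, Tuple

TeamState = Tuple[int, int, Hashable]  # (agent index r, automaton state q, agent state s)


def switch_transitions(
    decomposition: Set[int],
    product_states: Dict[int, Set[Tuple[int, Hashable]]],
    initial_agent_state: Dict[int, Hashable],
) -> Set[Tuple[TeamState, TeamState]]:
    """Build all switch transitions from agent r to agent r + 1."""
    n_agents = len(product_states)
    switches = set()
    for r in range(1, n_agents):                 # r_s in {1, ..., N-1}
        for (q, s) in product_states[r]:
            if q in decomposition:               # start in the decomposition set
                target = (r + 1, q, initial_agent_state[r + 1])  # same q, next agent
                switches.add(((r, q, s), target))
    return switches


# Hypothetical team of three agents with one agent state each.
product_states = {1: {(0, "x"), (1, "x")}, 2: {(0, "y"), (1, "y")}, 3: {(0, "z"), (1, "z")}}
initial_agent_state = {1: "x", 2: "y", 3: "z"}
zeta = switch_transitions({0}, product_states, initial_agent_state)
print(sorted(zeta))
```

Because every switch transition points from agent r to agent r+1, a path through the resulting team model can never return to an earlier agent.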

An example of the structure of the team model is depicted in FIG. 6, which shows an example of a system comprising three agents. The team model has an initial state (bottom left corner) and three final states (right side). Between the agent automata, directed switch transitions to the next agent connect states of the decomposition set.

In step 1450 following step 1440, the set of switch transitions is added to the set of actions of the team model, i.e., the set of actions is replaced by its union with the set of switch transitions. This concludes the algorithm shown in FIG. 4.

FIG. 5 shows a flow-chart diagram that describes a preferred embodiment of an even further aspect of the invention. This even further aspect of the invention relates to deriving an action sequence β_(fin) which minimizes a team cost κ for given agent models of the team of agents 11, 12, 13, a cost function C, initial resources γ₀≥0 and the specification ϕ of the mission such that the specification ϕ is satisfied. An action sequence β is called satisfying if the associated state sequence σ satisfies the specification ϕ.

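Generally, an action sequence β is preferably defined as β=s₀a₁s₁ . . . a_(n)s_(n), which is a run in the team model with each s_(i) being a state and each a_(i) being an action of the team model. In order to distribute β among the involved agents, the action sequence β^((r)) for agent r is preferably obtained by projecting β onto the model associated with agent r.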

Conveniently, the cost function C may be defined as follows. Each action of the team model is assigned a non-negative cost, i.e., C maps the set of actions of the team model to ℝ_(≥0). For switch transitions, preferably the associated cost is chosen as zero to reflect the fact that switch transitions are purely virtual and will not appear in the action sequence β^((r)) executed by the agents 11, 12, 13.

For modelling the multi-agent character of a cost, it is convenient to extend the cost C(a) associated with an action a of the team model to a vector of the same dimensionality N as the number of agents 11, 12, 13, i.e. C(a)∈ℝ_(≥0)^(N), where each agent r=1, . . . , N represents one dimension.

To reflect the fact that each action a with non-zero cost c_(a)=C(a) is associated with a particular agent r, since every action of the team model that is not a switch transition belongs to the set of actions of exactly one product model P^((r)), it is convenient to define

$c_{a,i} = \begin{cases} C(a), & \text{if } i = r \\ 0, & \text{otherwise} \end{cases}$

and a cost of 0 for every switch transition. Consequently, the costs c_(β) associated with an action sequence β can be computed as c_(β)=Σ_(a∈β)c_(a).

Given a set of action sequences, a Pareto front of all cost vectors c_(β) for satisfying action sequences β then forms a set of potentially optimal solutions. In order to prioritize these solutions, in a preferred embodiment one may compute an overall team cost κ as κ(c_(β))=(1−ϵ)∥c_(β)∥_(∞)+ϵ∥c_(β)∥₁, where ϵ∈(0,1] may be chosen fixed but freely. This conveniently reflects an objective to minimize the maximal agent cost ∥c_(β)∥_(∞), e.g. minimizing a completion time of the mission, and an objective to avoid unnecessary actions of the agents 11, 12, 13 via a regularization term ∥c_(β)∥₁.
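As a small numerical illustration, the per-agent cost vector c_(β) and the team cost κ can be computed as in the following Python sketch. The representation of an action as a pair of scalar cost and agent index (with `None` marking a switch transition) and the function names are assumptions chosen for this example only.

```python
from typing import List, Optional, Sequence, Tuple

# Hypothetical action record: (scalar cost C(a), agent index r or None for a switch).
Action = Tuple[float, Optional[int]]


def sequence_cost(actions: Sequence[Action], n_agents: int) -> List[float]:
    """Sum the per-agent cost vectors c_a over an action sequence."""
    c = [0.0] * n_agents
    for cost, agent in actions:
        if agent is not None:          # switch transitions contribute zero cost
            c[agent - 1] += cost       # only component r of c_a is non-zero
    return c


def team_cost(c: Sequence[float], eps: float) -> float:
    """kappa(c) = (1 - eps) * max-norm + eps * 1-norm, for non-negative costs."""
    return (1.0 - eps) * max(c) + eps * sum(c)


beta = [(3.0, 1), (4.0, 1), (0.0, None), (2.0, 2)]
c_beta = sequence_cost(beta, n_agents=3)
print(c_beta, team_cost(c_beta, eps=0.5))
```

A small ϵ makes the maximal agent cost dominate, while the 1-norm term breaks ties in favor of sequences with fewer unnecessary actions.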

To reduce memory requirements for storing the cost vector c_(β), preferably the cost vector c_(β) is stored as a compressed cost vector ĉ_(β) which is three-dimensional, independent of the number of agents, by recursively choosing

$\begin{matrix}{\hat{c}_{\beta} = \begin{pmatrix}\left\| \left( c_{\beta,1},\ldots,c_{\beta,r-1} \right)^{T} \right\|_{\infty} \\ \left\| \left( c_{\beta,1},\ldots,c_{\beta,r-1} \right)^{T} \right\|_{1} \\ c_{\beta,r}\end{pmatrix}.} & (1)\end{matrix}$

This definition exploits the mathematical truth discovered as part of the work leading to the invention that, given a fixed but arbitrary agent r, the team cost κ of the action sequence β can already be evaluated for all agents r′<r since no action associated with any of these agents r′ will occur in a continuation of β.

This makes it possible to simplify the computation of the team cost κ by instead computing a compressed team cost

$\begin{matrix}{\hat{\kappa}\left( \hat{c}_{\beta} \right) = \left( 1 - \epsilon \right)\left\| \left( \hat{c}_{\beta,1},\hat{c}_{\beta,3} \right)^{T} \right\|_{\infty} + \epsilon\left\| \left( \hat{c}_{\beta,2},\hat{c}_{\beta,3} \right)^{T} \right\|_{1},} & (2)\end{matrix}$

with ĉ_(β,i) denoting the i-th component of the compressed cost vector ĉ_(β). This representation not only removes a dependency of the cost representation on the team size N, it is also a more efficient representation during planning. The reason for this efficiency gain is that additional cost vectors are Pareto-dominated, as will be discussed below in the discussion of step 2100, and can thus be eliminated from the set of potential solutions much earlier in the planning process.
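The relation between the full cost vector and its compressed form can be checked numerically with the following Python sketch. It assumes that r is the index of the currently acting agent and that all agents after r have not yet incurred any cost; the function names are illustrative only.

```python
from typing import Sequence, Tuple


def team_cost(c: Sequence[float], eps: float) -> float:
    """Uncompressed team cost: (1 - eps) * max-norm + eps * 1-norm."""
    return (1.0 - eps) * max(c) + eps * sum(c)


def compress(c: Sequence[float], r: int) -> Tuple[float, float, float]:
    """Compressed cost vector of equation (1) for the acting agent r (1-based)."""
    preceding = c[: r - 1]
    return (max(preceding, default=0.0), sum(preceding), c[r - 1])


def compressed_team_cost(c_hat: Tuple[float, float, float], eps: float) -> float:
    """Compressed team cost of equation (2)."""
    return (1.0 - eps) * max(c_hat[0], c_hat[2]) + eps * (c_hat[1] + c_hat[2])


# Hypothetical costs of four agents while agent 3 is acting (agent 4 still at zero).
c_beta = [3.0, 5.0, 2.0, 0.0]
c_hat = compress(c_beta, r=3)
print(c_hat, compressed_team_cost(c_hat, eps=0.5), team_cost(c_beta, eps=0.5))
```

Both values coincide in this example because the preceding agents' costs enter the team cost only through their maximum and their sum.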

Furthermore, in addition to the specification ϕ, which allows discrete constraints to be modeled, in an optional further development it is possible to consider constraints of the agents in continuous domains, for example constraints on resources γ. A change of resources γ may be modeled by a resource function Γ that maps each action a of the team model to a vector in ℝ^(M), where M indicates the number of resource dimensions; Γ(a) models the change of resources γ under the given action a. Conveniently, the resource function can take both negative and positive values to reflect the fact that resources can be modified in both directions.

For the action sequence β, the resulting status of resources γ_(β) is given by γ_(β)=γ₀+Σ_(a∈β)Γ(a). The set of satisfying action sequences is constrained to sequences β=s₀a₁s₁ . . . a_(n)s_(n) such that, for any state s_(x)∈β and the truncation β′ of sequence β up to this state s_(x), i.e. β′=s₀a₁ . . . a_(x)s_(x), it holds that γ_(β′,i)>0 for each component i=1, . . . , M. In other words, the action sequences β are constrained such that the inequality constraint on the resources γ_(β) holds at any time during the execution of the action sequence β.
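The constraint on every truncation β′ can be sketched as a running-sum check. This is a minimal illustration; the function name and the representation of Γ(a) as plain lists of numbers are assumptions made for this example.

```python
from typing import Sequence


def resources_stay_positive(
    gamma0: Sequence[float], changes: Sequence[Sequence[float]]
) -> bool:
    """Check that every resource component is > 0 after each truncation.

    gamma0  -- initial resources, one entry per resource dimension (M entries)
    changes -- the resource change Gamma(a) of each action, in execution order
    """
    gamma = list(gamma0)
    if any(g <= 0 for g in gamma):  # truncation consisting of s_0 only
        return False
    for delta in changes:
        gamma = [g + d for g, d in zip(gamma, delta)]  # apply Gamma(a)
        if any(g <= 0 for g in gamma):  # violated at this intermediate state
            return False
    return True


# Resource trace 5 -> 2 -> 4 -> 1: positive at every state.
ok = resources_stay_positive([5.0], [[-3.0], [2.0], [-3.0]])
# Resource trace 5 -> 2 -> 0 -> 4: the final value is positive, but an
# intermediate state violates the constraint.
violated = resources_stay_positive([5.0], [[-3.0], [-2.0], [4.0]])
```

The second call shows why checking only the final status γ_(β) would not be sufficient.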

Note that it is also possible to express constraints of the form γ_(β,i)≥0 within this framework by choosing a fixed offset ξ smaller than the minimal change γ_(Δ,i) of the resource component γ_(β,i) under an exchange of any one action a_(j) for any other action a_(k), i.e. γ_(Δ,i)=min_((a_(j),a_(k)))|Γ(a_(j))_(i)−Γ(a_(k))_(i)|. The constraint γ_(β,i)≥0 can then be modeled as an equivalent inequality constraint γ_(β,i)+ξ>0.

While it would be possible to capture interval constraints of the form γ_(β,i)∈I=(I_(l),I_(u)) by a set of two inequality constraints, a more preferred solution, which introduces a smaller number of Pareto-optimal labels as explained below, is to remodel the interval constraint as

$\frac{I_{u} - I_{l}}{2} - \gamma_{\beta,I} > 0$

where

$\gamma_{\beta,I} = \left\| \frac{I_{u} - I_{l}}{2} + I_{l} - \gamma_{\beta,i} \right\|$

denotes a distance measure of γ_(β,i) from the center of the interval I.

The actual algorithm for the planning problem discussed above is based on a label-setting approach, which can be thought of as a multi-criteria generalization of the Dijkstra shortest path search. Instead of operating on states with associated costs, the label-setting algorithm constructs a set of labels for each state. For each state s∈S_(𝒢), a label l is given as l=(ĉ_(β),γ_(β),v,i_(v)), which depends on the action sequence β that led to state s. Here, ĉ_(β) is the associated compressed cost, γ_(β) is the associated resource status, v∈S_(𝒢) is the state that precedes state s in action sequence β, and i_(v) is the respective predecessor label.

In other words, the construction of such a multi-dimensional label l for each state s is an extension of the team-model state space S_(𝒢) to a higher-dimensional, infinitely large label space ℒ, in which each label l∈L_(s)⊂ℒ of state s instantiates one possible continuous resource configuration γ and transitions between the labels are described by their predecessor relations. L_(s) denotes the set of instantiated, i.e., feasible, labels at state s and L=∪_(s)L_(s)⊂ℒ denotes the set of all feasible labels.

It is possible to model a resource constraint as a proposition π_(i), e.g., π_(i):=(γ_(β,i)>0). Whether or not π_(i) is true would, in the state space S_(𝒢), depend on a full action sequence β. However, in the label space, π_(i) is either true or false for each element of the label space, since it is possible to associate a single label l with a specific γ_(l,i)=γ_(β,i) as its second component. In a preferred embodiment, the resource constraints are indeed modeled in this way, and the corresponding set of resource constraint propositions is denoted by Π_(γ).

The actual algorithm, which is illustrated in FIG. 5, starts with an initialization in step 2010. A set of temporary labels L_(t,v) is initialized as L_(t,v)={(0,γ₀,Ø,Ø)} for each initial state v. For each other state s∈S_(𝒢) that is not an initial state, a set of temporary labels L_(t,s) is initialized as L_(t,s)=Ø. Furthermore, for each state s∈S_(𝒢), a set of permanent labels L_(p,s) is initialized as L_(p,s)=Ø.
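The label structure and the initialization of step 2010 could be sketched as follows. The class name `Label`, its field names, the use of `None` for the empty predecessor entries, and the use of strings as states are assumptions made for this example.

```python
from dataclasses import dataclass
from typing import Dict, Hashable, Iterable, List, Optional, Sequence, Tuple


@dataclass(frozen=True)
class Label:
    cost: Tuple[float, float, float]  # compressed cost vector
    resources: Tuple[float, ...]      # resource status gamma
    pred_state: Optional[Hashable]    # preceding state v (None for initial labels)
    pred_index: Optional[int]         # position of the predecessor label at v


def initialize(
    states: Iterable[Hashable],
    initial_states: Iterable[Hashable],
    gamma0: Sequence[float],
) -> Tuple[Dict[Hashable, List[Label]], Dict[Hashable, List[Label]]]:
    """Step 2010: one zero-cost temporary label per initial state, all else empty."""
    states = list(states)
    temporary: Dict[Hashable, List[Label]] = {s: [] for s in states}
    permanent: Dict[Hashable, List[Label]] = {s: [] for s in states}
    for v in initial_states:
        temporary[v].append(Label((0.0, 0.0, 0.0), tuple(gamma0), None, None))
    return temporary, permanent


temporary, permanent = initialize(["s0", "s1", "s2"], ["s0"], gamma0=[10.0])
```

Keeping the label immutable (`frozen=True`) is a design choice of this sketch; it allows labels to be compared by value and stored safely in both label sets.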

In the following step 2020, it is checked whether the set of temporary labels L_(t,s) is empty for each state s. If this is the case, no final state f is reachable and the algorithm stops with an error indication in step 2021, which may result in controlling the agents 11, 12, 13 accordingly, e.g. by transitioning the control system 10 into a safe state.

If, however, it is determined that the set of temporary labels L_(t,s) is not empty for at least one state s, the method proceeds to step 2030. In step 2030, the compressed cost vector ĉ_(β) is computed according to equation (1) and the compressed team cost κ̂(ĉ_(β)^((l))) is computed according to equation (2). This is possible since each label l specifies its predecessor label, so that the action sequence β leading to label l can be reconstructed. Then, a minimizing state s* and a minimizing label l* from the corresponding set of temporary labels L_(t,s*) are determined such that they minimize the compressed team cost κ̂, i.e.

$\left( s^{*},l^{*} \right) = {\arg\min}_{s \in S_{\mathcal{G}},\, l \in L_{t,s}}\hat{\kappa}\left( {\hat{c}}_{\beta}^{(l)} \right).$

In the next step 2040, the minimizing label l* is removed from the set of temporary labels L_(t,s*) corresponding to the minimizing state s*, i.e. L_(t,s*)←L_(t,s*)\{l*}, and added to the corresponding set of permanent labels L_(p,s*), i.e. L_(p,s*)←L_(p,s*)∪{l*}.
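Steps 2020 to 2040 might be sketched as a single selection routine. This is an illustration under stated assumptions: labels are modeled as plain tuples (compressed cost, resources, predecessor state, predecessor index), costs are assumed to be non-negative, and all function and variable names are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

# Assumed label layout: (compressed_cost, resources, pred_state, pred_index)
Label = Tuple[Tuple[float, float, float], Tuple[float, ...], Optional[str], Optional[int]]


def kappa_hat(c_hat: Tuple[float, float, float], eps: float) -> float:
    """Compressed team cost of equation (2), assuming non-negative costs."""
    c1, c2, c3 = c_hat
    return (1 - eps) * max(c1, c3) + eps * (c2 + c3)


def select_and_fix(
    temporary: Dict[str, List[Label]],
    permanent: Dict[str, List[Label]],
    eps: float,
) -> Optional[Tuple[str, Label]]:
    """Return (s*, l*) and move l* to the permanent set, or None if nothing is left."""
    candidates = [(s, l) for s, labels in temporary.items() for l in labels]
    if not candidates:  # step 2020: all temporary sets are empty
        return None
    # Step 2030: minimize the compressed team cost over all temporary labels.
    s_star, l_star = min(candidates, key=lambda sl: kappa_hat(sl[1][0], eps))
    # Step 2040: make the minimizing label permanent.
    temporary[s_star].remove(l_star)
    permanent[s_star].append(l_star)
    return s_star, l_star


temporary = {
    "a": [((0.0, 0.0, 5.0), (1.0,), None, None)],
    "b": [((3.0, 3.0, 1.0), (1.0,), "a", 0), ((0.0, 0.0, 2.0), (1.0,), "a", 0)],
}
permanent = {"a": [], "b": []}
result = select_and_fix(temporary, permanent, eps=0.1)
```

In this example the second label at state "b" has the smallest compressed team cost and is therefore selected.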

In the following step 2050, it is checked whether the minimizing state s* is an accepting state. If this is the case, the method continues with step 2060; if not, it continues with step 2080.

In step 2060, a final label l_(fin) is set to label l*. As outlined above, the corresponding final action sequence β_(fin) is reconstructed iteratively from the predecessor labels. The final action sequence β_(fin) is the selected action sequence β with minimal compressed team cost κ̂ and hence minimal team cost κ. This concludes the algorithm.
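The iterative reconstruction from predecessor labels can be sketched as a backward walk. The tuple layout of the labels, the 0-based predecessor index, and the representation of an action as a (source state, target state) pair are assumptions made for this example.

```python
from typing import Dict, List, Optional, Tuple

# Assumed label layout: (compressed_cost, resources, pred_state, pred_index)
Label = Tuple[Tuple[float, float, float], Tuple[float, ...], Optional[str], Optional[int]]


def reconstruct_actions(
    permanent: Dict[str, List[Label]], final_state: str, final_label: Label
) -> List[Tuple[str, str]]:
    """Follow predecessor entries back to an initial label and return the actions."""
    states = [final_state]
    label = final_label
    while label[2] is not None:  # initial labels have no predecessor
        pred_state, pred_index = label[2], label[3]
        states.append(pred_state)
        label = permanent[pred_state][pred_index]
    states.reverse()
    return list(zip(states, states[1:]))  # consecutive state pairs are the actions


l0: Label = ((0.0, 0.0, 0.0), (5.0,), None, None)
l1: Label = ((0.0, 0.0, 2.0), (4.0,), "s0", 0)
l2: Label = ((0.0, 0.0, 5.0), (3.0,), "s1", 0)
permanent = {"s0": [l0], "s1": [l1], "s2": [l2]}
beta_fin = reconstruct_actions(permanent, "s2", l2)
```

Because predecessors are always permanent labels, whose position in their set no longer changes, the stored index remains valid for the whole run.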

In step 2080, a set V of all neighboring states v of minimizing state s* and a corresponding set L_(v) of corresponding neighboring labels are determined. For example, V may be determined by initializing V=Ø, L_(v)=Ø, exploring each state v=(r_(v),q_(v),s_(v))∈S_(𝒢), and adding state v to set V if and only if there is an action a of the team model that links the minimizing state s* to state v, i.e. a=(s*,v). If state v is added to set V, the corresponding new costs ĉ_(new) are computed depending on action a via

${\hat{c}}_{new} = \left\{ \begin{matrix}\begin{pmatrix}{\left\| \left( {\hat{c}}_{1}^{(l)},{\hat{c}}_{3}^{(l)} \right) \right\|_{\infty}} \\ {\left\| \left( {\hat{c}}_{2}^{(l)},{\hat{c}}_{3}^{(l)} \right) \right\|_{1}} \\ 0\end{pmatrix} & {{\text{if}\ a} \in \zeta} \\ {{\hat{c}}^{(l)} + \left( 0,0,C(a) \right)^{T}} & \text{otherwise}\end{matrix} \right..$

Similarly, corresponding new resources γ_(new) are computed depending on action a via

$\gamma_{new} = \left\{ \begin{matrix}\begin{pmatrix}\gamma_{global}^{(l)} \\ \gamma_{0,r_{v}}\end{pmatrix} & {{\text{if}\ a} \in \zeta} \\ {\gamma^{(l)} + \Gamma(a)} & \text{otherwise}\end{matrix} \right..$

In this formula, γ_(global)^((l)) denotes the part of the resources γ^((l)) that is global, i.e. independent of the agent, and γ_(0,r_(v)) denotes the initial resources of agent r_(v). A corresponding new label l_(v)=(ĉ_(new),γ_(new),s*,i_(s*)) is generated, with i_(s*)=card(L_(p,s*)). This corresponding new label l_(v) is then added to the set of neighboring labels L_(V). After exploration of all states v is completed, the method continues with step 2090.
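The two update rules for ĉ_(new) and γ_(new) can be illustrated as follows. This is a sketch under stated assumptions: costs are non-negative, the resource vector stores its global components first (their number is passed as the hypothetical parameter `n_global`), and a switch transition is flagged by a Boolean instead of a membership test a∈ζ.

```python
from typing import Sequence, Tuple


def extend_cost(
    c_hat: Tuple[float, float, float], action_cost: float, is_switch: bool
) -> Tuple[float, float, float]:
    """New compressed cost after a switch transition or a regular action."""
    c1, c2, c3 = c_hat
    if is_switch:
        # Fold the finished agent into the max and sum terms, restart at zero.
        return (max(c1, c3), c2 + c3, 0.0)
    return (c1, c2, c3 + action_cost)  # only the acting agent accumulates cost


def extend_resources(
    gamma: Sequence[float],
    delta: Sequence[float],
    n_global: int,
    gamma0_next_agent: Sequence[float],
    is_switch: bool,
) -> Tuple[float, ...]:
    """New resource status after a switch transition or a regular action."""
    if is_switch:
        # Keep global resources, reset agent resources to the next agent's initial ones.
        return tuple(gamma[:n_global]) + tuple(gamma0_next_agent)
    return tuple(g + d for g, d in zip(gamma, delta))  # gamma + Gamma(a)


c_after_action = extend_cost((4.0, 6.5, 3.0), action_cost=2.0, is_switch=False)
c_after_switch = extend_cost(c_after_action, action_cost=0.0, is_switch=True)
g_after_action = extend_resources((100.0, 3.0), (-10.0, -1.0), 1, (8.0,), False)
g_after_switch = extend_resources(g_after_action, (0.0, 0.0), 1, (8.0,), True)
```

Note that the switch branch re-applies the compression of equation (1) incrementally, so the full per-agent cost vector never has to be stored.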

In the next step 2090, it is checked for each neighboring label l∈L_(v) whether the corresponding new resource status γ_(new) satisfies all constraints. For this purpose, an extended transition function Δ, which is an extension of the transition function δ of the nondeterministic finite automaton and which maps an action together with a resource status in ℝ^(M) to a truth value in {⊤,⊥}, is defined as Δ:(a=((r_(s),q_(s),s_(s)),(r_(t),q_(t),s_(t))),γ)↦(λ(s_(s))∪Π_(γ))⊨δ(q_(s),q_(t)). The action a_(l) associated with neighboring label l is determined and it is checked whether Δ(a_(l),γ_(new)) is true. If it is not true, the method branches back to step 2020. If it is true, however, it is also checked whether the neighboring label l is non-dominated in the Pareto sense.

For ease of notation, an operator <_(P) denotes a “less than” relation in the Pareto sense, i.e. (a₁, . . . , a_(n))^(T)<_(P)(b₁, . . . , b_(n))^(T)⇔a≠b∧a_(i)≤b_(i)∀i∈{1, . . . , n}. An operator ≤_(P) relaxes this relation and also allows a=b. A label l is non-dominated in the Pareto sense if there does not exist another label l′ in either the set of temporary labels L_(t,v) or the set of permanent labels L_(p,v) at the same state v such that (ĉ^((l′)),−γ^((l′)))≤_(P)(ĉ^((l)),−γ^((l))).

If it is found that no such dominating label exists, it is deemed that the neighboring label l is non-dominated in the Pareto sense and the method continues with step 2100. If, however, such a dominating label exists, the method skips back to step 2020.
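The Pareto comparison used in step 2090 can be sketched as follows. Labels are reduced here to (cost, resources) pairs, and all function names are hypothetical; the negated resources reflect that lower cost and higher remaining resources are both preferable.

```python
from typing import Iterable, Sequence, Tuple

Vec = Tuple[float, ...]
CostAndResources = Tuple[Sequence[float], Sequence[float]]


def criteria(cost: Sequence[float], resources: Sequence[float]) -> Vec:
    """Stack the compressed cost and the negated resources into one vector."""
    return tuple(cost) + tuple(-g for g in resources)


def leq_pareto(a: Vec, b: Vec) -> bool:
    """a <=_P b: every component of a is less than or equal to that of b."""
    return all(x <= y for x, y in zip(a, b))


def less_pareto(a: Vec, b: Vec) -> bool:
    """a <_P b: a <=_P b and a differs from b."""
    return a != b and leq_pareto(a, b)


def is_non_dominated(
    new: CostAndResources, existing: Iterable[CostAndResources]
) -> bool:
    """True if no temporary or permanent label at the same state is <=_P the new one."""
    new_vec = criteria(*new)
    return not any(leq_pareto(criteria(*old), new_vec) for old in existing)


existing = [((3.0, 3.0, 1.0), (5.0,))]
worse = is_non_dominated(((4.0, 4.0, 1.0), (5.0,)), existing)
tradeoff = is_non_dominated(((4.0, 4.0, 1.0), (9.0,)), existing)
```

The second candidate is more expensive but leaves more resources, so neither label dominates the other and both would be kept.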

In step 2100, all labels l′ which are dominated by any neighboring label l∈L_(v), where said neighboring label was found to satisfy all constraints and to be non-dominated in the Pareto sense by another label, are removed from the set of temporary labels L_(t,v) at the same state v, i.e., L_(t,v)←L_(t,v)\{l′∈L_(t,v):l<_(P)l′}.

Next, in step 2110, all aforementioned neighboring labels l are added to the set of temporary labels L_(t,v), i.e. L_(t,v)←L_(t,v)∪{l}. The method then continues with step 2020.
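Steps 2100 and 2110 together amount to pruning followed by insertion, which might be sketched as follows. The label representation as (cost, resources) pairs and the function names are assumptions made for this example, and the new label is assumed to have already passed the checks of step 2090.

```python
from typing import List, Sequence, Tuple

Vec = Tuple[float, ...]
CostAndResources = Tuple[Sequence[float], Sequence[float]]


def criteria(label: CostAndResources) -> Vec:
    """Compressed cost followed by negated resources."""
    cost, resources = label
    return tuple(cost) + tuple(-g for g in resources)


def less_pareto(a: Vec, b: Vec) -> bool:
    """a <_P b: component-wise less than or equal, and not identical."""
    return a != b and all(x <= y for x, y in zip(a, b))


def insert_label(
    temporary_v: List[CostAndResources], new_label: CostAndResources
) -> List[CostAndResources]:
    """Drop temporary labels dominated by new_label, then add new_label."""
    new_vec = criteria(new_label)
    kept = [old for old in temporary_v if not less_pareto(new_vec, criteria(old))]
    kept.append(new_label)  # step 2110
    return kept


temporary_v = [((5.0, 5.0, 2.0), (4.0,)), ((1.0, 1.0, 1.0), (2.0,))]
new_label = ((3.0, 3.0, 2.0), (4.0,))
updated = insert_label(temporary_v, new_label)
```

Here the first old label is more expensive with equal resources and is removed, while the second one is cheaper but has fewer resources and therefore survives next to the new label.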

What is claimed is:
 1. A method for controlling a plurality of agents to complete a mission, comprising: deriving a decomposition set of decomposition states in a set of possible states of an automaton, wherein the automaton represents a specification of the mission including all tasks of the mission, the automaton being defined by a tuple including: (i) the set of possible states, (ii) a set of initial states, the set of initial states being a subset of the set of possible states, (iii) a set of Boolean formulas over a set of atomic propositions, (iv) transition conditions, and (v) a set of accepting states, the set of accepting states being a subset of the set of possible states; deriving a sequence of actions to be carried out by the plurality of agents depending on the decomposition set, where each of the actions is to be carried out by at most one of the plurality of agents; and providing a control signal for controlling the plurality of agents in accordance with the derived sequence of actions.
 2. The method according to claim 1, further comprising: controlling the plurality of agents in accordance with the derived sequence of actions.
 3. The method according to claim 1, further comprising: generating the decomposition set by exploring an essential sequence of an accepting run through one or more candidate decomposition states.
 4. The method according to claim 3, further comprising: adding the one or more candidate decomposition state to the decomposition set depending on whether a complementary sequence to the explored essential sequence around the respective one or more candidate decomposition state is accepting.
 5. The method according to claim 4, wherein the decomposition set includes all those states in the set of possible states of the automaton, for which the complementary sequence to the explored essential sequence around the respective state is accepting.
 6. The method according to claim 1, further comprising: generating a team model based on the automaton that represents a specification of the mission and based on automata that each model the capabilities of one of the plurality of agents.
 7. The method according to claim 6, wherein the team model comprises a set of actions that comprises switch transitions which change the acting agent from one of the plurality of agents to another one of the plurality of agents.
 8. The method according to claim 7, wherein the switch transitions are configured to each change the acting agent from one of the plurality of agents to a next one of the plurality of agents.
 9. The method according to claim 8, wherein the switch transitions are configured such as to act only if the automaton that characterizes the mission is in a decomposition state.
 10. The method according to claim 6, further comprising: deriving the sequence of actions to be carried out by the plurality of agents by a label-setting algorithm in which each state of a set of states of the team model is associated with labels that are characterized by a sequence of action leading to the respective state.
 11. The method according to claim 10, further comprising: constructing a reachable set of temporary labels for each state and a set of permanent labels.
 12. The method according to claim 11, further comprising: constructing, for each selected label, a set of consecutive labels by extending an action sequence associated with the selected label by all available actions and adding the resulting labels to the reachable set of temporary labels.
 13. The method according to claim 12, wherein each label comprises at least one component that characterizes a cost under the corresponding sequence of actions.
 14. The method according to claim 13, wherein the derived sequence of actions to be carried out by the plurality of agents is the one out of all actions that satisfy a characterization of the mission that minimizes a team cost which depends on the component that characterizes the cost.
 15. The method according to claim 14, wherein only actions resulting in Pareto-optimal labels at their target state are added to the reachable set of temporary labels.
 16. The method according to claim 13, wherein the component that characterizes the cost under the corresponding sequence of actions is depending on costs associated with each of these actions with one component each for each one of the agents.
 17. The method according to claim 16, wherein the component that characterizes the cost under the corresponding sequence of actions is stored in memory by way of a data structure that comprises at least one component that characterizes costs associated with a selected one of the agents and at least one component that characterizes the costs associated with a group of agents that precede the selected one of the agents.
 18. The method according to claim 17, wherein each label comprises at least one component that characterizes a resource status at the respective state under the corresponding sequence of actions.
 19. The method according to claim 18, wherein the characterization of the mission comprises an inequality constraint that restricts the at least one component that characterizes a resource status to a predefined region.
 20. A non-transitory machine-readable storage medium on which is stored a computer program for controlling a plurality of agents to complete a mission, the computer program, when executed by a computer, causing the computer to perform: deriving a decomposition set of decomposition states in a set of possible states of an automaton, wherein the automaton represents a specification of the mission including all tasks of the mission, the automaton being defined by a tuple including: (i) the set of possible states, (ii) a set of initial states, the set of initial states being a subset of the set of the possible states, (iii) a set of Boolean formulas over a set of atomic propositions, (iv) transition conditions, and (v) a set of accepting states, the set of accepting states being a subset of the set of possible states; deriving a sequence of actions to be carried out by the plurality of agents depending on the decomposition set, where each of the actions is to be carried out by at most one of the plurality of agents; and controlling the plurality of agents in accordance with the derived sequence of actions.
 21. A system for controlling a plurality of agents to complete a mission, which is configured to: derive a decomposition set of decomposition states in a set of possible states of an automaton, wherein the automaton represents a specification of the mission including all tasks of the mission, the automaton being defined by a tuple including: (i) the set of possible states, (ii) a set of initial states, the set of initial states being a subset of the set of possible states, (iii) a set of Boolean formulas over a set of atomic propositions, (iv) transition conditions, and (v) a set of accepting states, the set of accepting states being a subset of the set of possible states; derive a sequence of actions to be carried out by the plurality of agents depending on the decomposition set, where each of the actions is to be carried out by at most one of the plurality of agents; and control the plurality of agents in accordance with the derived sequence of actions.
 22. The system according to claim 21, further comprising at least one of the plurality of agents.